-
Notifications
You must be signed in to change notification settings - Fork 13.7k
[FLINK-38444][table] Join unique keys should not contain null values #27090
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
assertEquals( | ||
toBitSet(Array(1), Array(1, 5), Array(1, 5, 6)), | ||
mq.getUpsertKeys(logicalLeftJoinOnUniqueKeys).toSet) | ||
assertEquals(toBitSet(Array(1)), mq.getUpsertKeys(logicalLeftJoinOnUniqueKeys).toSet) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am i missing something, I do not see tests for nulls on each side?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above is the test: 5, and 6 were columns coming from the right side of the join. So they can't be included in unique keys adn that's why "Array(1, 5), Array(1, 5, 6)" were remove as possible unique keys.
The same for all the other tests that were updated, they test the new behaviour
CI is failling on unrelated e2e tests issue happening across PRs |
Hi, @gustavodemorais. I'm wondering why we can't have null columns as part of unique keys? From what I see in the SQL standard, it seems that unique keys can contain null columns.
|
Hey @xuyangzhong, thanks for the comment. I was at Flink Forward and just reading your reply now. That's a good catch. The issue is that we are using the unique keys as upsert keys and in my understand we can't allow upsert keys to contain null values and that's what I was trying to fix here. It leads to a runtime error since a sink expects the upsert key not to contain a null value which can happen in a case of a left join. https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/metadata/FlinkRelMdUpsertKeys.scala#L352 We could also have additional logic in getJoinUpsertKeys to remove such keys. Do you have other suggestions? |
@gustavodemorais IIUC, currently there is no explicit handling of null values for the upsert key from the planning stage (where it originates from the unique key, which is not restricted from containing null values) to runtime (where the operators do not include any special treatment for columns that may contain null values in the upsert key). In other words, there are no restrictions preventing the upsert key from containing null values. I’m curious about the issue you mentioned. Could you explain it in more detail?
|
Sure. If we create two tables and apply a left join, it might happen that the upsert key contains nullable columns, like in the examples in the tests. We can then create a third table with a primary key and submit an Here are simple repro steps
Two things happen
Ideally we would fail this during planning. Now, I still think that by definition upsert keys cannot contain null values - do you agree with that? At the same time, while reproducing this I realized the runtime issue might not be solved by addressing nullable upsert keys. What do you think? |
@gustavodemorais I agree with your point that we should ideally find this unexpected behavior during the planning phase. In fact, I believe there's a fundamental issue with how this sql is written. When using a LEFT JOIN, one must be aware that fields from the right table may be null. If the sink table defines some fields from the right table as part of the combined primary key—which must not contain null values—then it's reasonable to encounter an error stating "null attempted to be written to not null." Moreover, this issue is not related to sink upsert materialization, which is intended to address out-of-order problems. Under this sql, out-of-order issues aren't present, so I ran the SQL on MySQL, and I received a similar error regarding null being written to the primary key.
|
What is the purpose of the change
The planner currently considers the union of both the unique and upsert keys from the left and from the right to be a valid resulting upsert key. That's true for inner joins but for left/right/full joins that leads to a resulting unique key that contains columns that can be null, which is not valid.
Brief change log
Verifying this change
Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: (no)Documentation